Skip to content

Comments

swev-id: scikit-learn__scikit-learn-26194 | ROC: dtype-aware sentinel threshold for float scores; tests for probability/integer cases#56

Open
rowan-stein wants to merge 2 commits intoscikit-learn__scikit-learn-26194from
fix/roc-threshold-sentinel
Open

swev-id: scikit-learn__scikit-learn-26194 | ROC: dtype-aware sentinel threshold for float scores; tests for probability/integer cases#56
rowan-stein wants to merge 2 commits intoscikit-learn__scikit-learn-26194from
fix/roc-threshold-sentinel

Conversation

@rowan-stein
Copy link
Collaborator

Overview

This PR addresses Issue #53: #53

Problem: roc_curve can return a threshold > 1 when y_score are probability estimates in [0,1], due to prepending a sentinel threshold using max(y_score) + 1.

Fix

  • In sklearn/metrics/_ranking.py::roc_curve, replace the prepended threshold logic with dtype-aware handling:
    • For float scores: use np.nextafter(thresholds[0], np.inf, dtype=thresholds.dtype) to set a value strictly greater than the max score, minimally exceeding the domain when max == 1.0.
    • For integer scores: retain max + 1 behavior.
  • Docstring updated accordingly.

Non-regression tests

Added to sklearn/metrics/tests/test_ranking.py:

  1. test_roc_curve_thresholds_probabilities_below_one — when max(y_score) < 1, the first threshold is the next representable float above the max and thresholds remain within [0,1].
  2. test_roc_curve_thresholds_probability_one — when max(y_score) == 1.0, the first threshold is np.nextafter(1.0, +inf); mapping remains consistent.
  3. test_roc_curve_thresholds_integer_scores — for integer/non-probability scores, the first threshold is max(score) + 1 and monotonic decrease holds.

Reproduction steps and observed failure (pre-fix)

  • Command used (minimal selection):
pytest sklearn/metrics/tests/test_ranking.py -k "thresholds_probabilities_below_one or thresholds_probability_one or thresholds_integer_scores"
  • Pre-fix observed failure excerpt:
============================= test session starts ==============================
platform linux -- Python 3.11.14, pytest-9.0.2, pluggy-1.6.0
collected 207 items / 204 deselected / 3 selected

test_ranking.py FFF                                                      [100%]

=================================== FAILURES ===================================
______________ test_roc_curve_thresholds_probabilities_below_one _______________
...
E       assert 1.9 == 0.9000000000000001 ± 9.0e-07
...
__________________ test_roc_curve_thresholds_probability_one ___________________
...
E       assert 2.0 == 1.0000000000000002 ± 1.0e-06
...
___________________ test_roc_curve_thresholds_integer_scores ___________________
...
E       assert 4 == (np.int64(3) + 1)
...
====================== 3 failed, 204 deselected in 0.65s =======================

Validation (post-fix)

  • The same test selection passes locally after the fix.

Notes

  • CI will not be manually triggered per task rules; this PR targets branch scikit-learn__scikit-learn-26194 and will remain unmerged pending review.
  • This change preserves API semantics: thresholds remain strictly decreasing; the mapping score >= thresholds[i] remains valid.

PR created per request; title includes the required token.

…fter; keep +1 for integers. Add non-regression tests for probability and integer cases. Ref #53
@rowan-stein rowan-stein requested a review from a team December 26, 2025 13:32
Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix and the new coverage. I found one blocker: the new ROC threshold tests ended up duplicated later in the file, so the second set overwrites the first and leaves redundant code behind. Please drop the duplicate block so each test is declared exactly once, then we can take another pass.

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplicate ROC threshold tests are gone, and the dtype-aware sentinel change plus the remaining tests look correct. Thanks for the quick cleanup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants